Abstract

Background: When an outbreak of a novel pathogen occurs, some of the most pressing questions from a public- 
health point of view relate to its transmissibility, and the probabilities of different clinical outcomes following 
infection, to allow an informed response. Estimates of these quantities are often based on household data due to 
the high potential for transmission in this setting, but typically a rich spectrum of individual-level outcomes (from 
uninfected to serious illness) is simplified to binary data (infected or not). We address the added benefit from
retaining the heterogeneous outcome information in the case of the 2009-10 influenza pandemic, which posed 
particular problems for estimation of key epidemiological characteristics due to its relatively mild nature and hence 
low case ascertainment rates. 

Methods: We use mathematical models of within-household transmission and case ascertainment, together with 
Bayesian statistics, to estimate transmission probabilities stratified by household size, the variability of infectiousness
of cases, and a set of probabilities describing case ascertainment. This novel approach was applied to data we 
collected from the early "containment phase" stage of the epidemic in Birmingham, England. We also conducted a 
comprehensive review of studies of household transmission of influenza A(H1N1)pdm09.

Results: We find large variability in the published estimates of within-household transmissibility of influenza A 
(H1N1)pdm09 in both model-based studies and those reporting secondary attack rates, finding that these 
estimates are very sensitive to how an infected case is defined. In particular, we find that reliance on laboratory 
confirmation alone underestimates the true number of cases, while utilising the heterogeneous range of outcomes 
(based on case definitions) for household infections allows a far more comprehensive pattern of transmission to be 
elucidated. 

Conclusions: Differences in household sizes and how cases are defined could account for an appreciable 
proportion of the reported variability of within-household transmissibility of influenza A(H1N1)pdm09. Retaining
and statistically analysing the full spectrum of individual-level outcomes (based on case definitions) rather than 
taking a potentially arbitrary threshold for infection, provides much-needed additional information. In a future 
pandemic, our approach could be used as a real-time analysis tool to infer the true number of cases, within- 
household transmission rates and levels of case ascertainment. 
Introduction

Emerging infectious diseases remain an ongoing and ser- 
ious threat to human health. Determining the appropriate 
and measured response to any new threat is often guided 
by mathematical models, which critically depend on good 
estimates of key epidemiological parameters, such as 
transmission rates, case ascertainment and case severity. 
For respiratory pathogens such as influenza, the potential 
for a global pandemic is always present; however, the early
estimation of how virulent and transmissible a given 
organism may be remains extremely difficult. This is pri- 
marily because mild cases do not typically present them- 
selves to the public health system, and so there is always 
the possibility that severe cases will be considered more 
typical than they actually are. The UK Department of 
Health currently emphasises the need to ascertain severity 
of a novel pandemic as soon as possible [1]. Uncertainty 
regarding the severity of the recent 2009 H1N1 pandemic, 
which was relatively mild in most cases compared to pre- 
vious pandemics such as that in 1918-19, was a key pro- 
blem for early efforts to estimate the epidemiological 
quantities necessary to inform public health policy [2,3]. 

Household data have formed a key part of efforts to
estimate quantities relevant to the transmission 
dynamics of pandemic influenza [4,5]. The household is 
a natural unit for collection of epidemiological data for 
three main reasons. First, by definition members of a 
household are co-located and so are readily studied at 
the same time. Secondly, the close contacts between 
household members often lead to strong within-house- 
hold transmission that provides rich information for sta- 
tistical outbreak analysis. Finally, many interventions 
such as antiviral prophylaxis, treatment and isolation 
advice are often considered for targeting at the house- 
hold level [6,7]. Households therefore form epidemiolo- 
gically important units that are convenient to sample. 

In this study, we use data on a large number of house- 
holds (424) in Birmingham (England's second city and 
an early hotspot of the epidemic) affected during the 
first seven weeks of the 2009 H1N1 pandemic, to esti- 
mate within-household transmissibility, heterogeneity in 
infectiousness of cases, and the accuracy (given by a set 
of four probabilities) of case ascertainment. 

We also carried out a comprehensive review of house- 
hold-based studies, which were typically undertaken 
early in the pandemic. The overwhelming majority of 
these did not estimate transmission probabilities 
between individuals, but instead reported crude second- 
ary attack rates (SARs, see Additional file 1: Literature 
Review for a formal definition) amongst household con- 
tacts of initially detected individuals. While this 
approach is natural in the context of an emerging and 
rapidly growing pandemic, accurate estimation of transmission
intensity allows more general conclusions to be
obtained, which can inform public health management
strategies. This is because SARs arise as a result of inter- 
action between the biological process of transmission 
and the socio-demographic structure of a population. 
Estimation of the transmission probabilities indepen- 
dently of the demographic structure, as presented here, 
therefore allows more general conclusions to be drawn. 

The feature that sets our study apart from previous 
work is the combined use of multiple case definitions. 
In the majority of situations, public-health investigations 
of household infections record many observations about 
the individuals' health and symptoms. This rich
information is then generally converted to a binary outcome
(infected or not) according to a strict case definition, 
with such definitions typically varying between different 
public-health bodies. Here we develop a methodology 
that can be applied whenever plausible case definitions 
form a nested hierarchy as shown in Figure 1, and show 
how retaining the individual-level heterogeneous data 
allows us to compute the likely true infections and the 
errors associated with different case definitions. 

Methods 

We now present the protocols used to obtain data dur- 
ing the early phase of the 2009 pandemic. The key fea- 
tures of these protocols will apply in many different 
outbreak scenarios. These protocols motivated our sta- 
tistical methodology, which is also described below. 

Data collection 

During the initial phase of the pandemic in 2009 in the 
UK, suspected cases of influenza A(H1N1)pdm09 were
reported to the Health Protection Agency (HPA) by 
general practitioners. Individuals meeting a strict case 
definition were classified as 'possible' cases of influenza 
A(H1N1)pdm09 [8]. The case definition included presence
of fever or history of fever, and two other specified
flu-like symptoms or other severe illness consistent
with influenza infection. Epidemiological criteria were 
included in the case definition in addition to clinical cri- 
teria and related to recent travel to high incidence areas 
and contact with other laboratory confirmed cases of 
influenza A or influenza A(H1N1)pdm09 infection. Possible
cases were prescribed antiviral treatment and had
nasal and/or throat swabs taken in order to confirm the 
diagnosis, which was done using real-time PCR (RT- 
PCR) methods. All household contacts of laboratory- 
confirmed cases were subsequently prescribed antiviral 
prophylaxis, and any symptomatic contacts meeting the 
case definition were managed as 'possible' cases. 
Detailed demographic and clinical information regarding 
suspected cases and the household contacts of labora- 
tory-confirmed cases was collected by the HPA during 
this early period, known as the containment phase. 
Figure 1 Non-mathematical explanation of the method used. Risk pyramids are shown on the left and households on the right. Top: If cases
and non-cases are straightforwardly ascertained, then within-household transmission will tend to cluster the cases so that there are either many 
or few cases within a household. Bottom: The presence of further stratification complicates the picture if information is only available on one 
outcome, but if full information is available, the clustering of cases by household is still visible and gives an accurate picture of transmission. 
Case ascertainment is not explicitly represented in this cartoon, but the principle is similar. 



In Birmingham, BADGER (Birmingham and District 
General Practitioner Emergency Rooms - a cooperative 
of local general practitioners) which currently provides 
out-of-hours primary care services in the city, were 
commissioned by the local primary care trusts (health 
administrative organizations) to set up the Birmingham 
Flu Service. The staff was initially commissioned to 
undertake testing for H1N1 of patients referred by the 
HPA with likely symptoms. Subsequently they under- 
took the case management of possible cases of H1N1 
and their household contacts (in this case using the 
standard definition of households as individuals sharing 
living arrangements) including taking nose/throat swab 
samples and providing antiviral prophylaxis and treat- 
ment. This testing involved all suspected cases of H1N1 
(and their household contacts) notified to the HPA by 
all general practitioners (family doctors) within the Birmingham
Primary Care Trust area, and as such should
sample representatively the H1N1 cases in the general
population of Birmingham, although there is the possibility
of differential reporting by population subgroups.
The first laboratory-confirmed case seen by the clinic, 
with a documented illness onset date, reported becom- 
ing unwell on 5 May 2009. The West Midlands region, 
which covers a broad area of central England consisting 
of both rural and urban areas (including Birmingham),
had its first reported case on 30 April 2009. 

On 19 June 2009 several postcode areas in Birming- 
ham were identified as 'hotspots' or areas of sustained 
community transmission, and on 23 June 2009 Birming- 
ham as a whole was declared a hotspot. Alternative 
management strategies were adopted from 19 June 2009 
onwards: individuals were treated with antivirals on the 
basis of clinical suspicion rather than laboratory confir- 
mation if they were contacts of a confirmed case. From 
26 June 2009 onwards swabbing and contact tracing 
ceased in hotspot areas and individuals were treated on
the basis of clinical suspicion alone. The data presented 
here are based on information collected by the BADGER 
Flu Clinic and the HPA regarding the initial laboratory- 
confirmed cases, and their household contacts, seen by 
clinic staff. Data include index cases with reported dates 
of illness onset and antiviral treatment commencement 
(where reported) between 5 May 2009 and 18 June 
2009, before the change in management strategy was 
announced. Index cases without illness onset dates are 
also included in the analysis, 45 of whom may or may 
not have had dates of onset within the above-defined 
timeframe. There will also have been a number of con- 
tacts who would have been identified after 18 June 
2009. These cases and contacts may have, therefore, 
been treated on the basis of clinical suspicion, rather 
than swabbed. Information regarding 424 initial cases 
(in 424 separate households) and their 1,612 household 
contacts was used to generate the data for analysis. Spe- 
cific information extracted for the purposes of this study 
included broad postcode area of the households, details 
regarding presence of symptoms in contacts, whether 
swabs had been taken (that is, the individual met the 
case definition) and associated laboratory results. 

Statistical analysis 

Our data involve counts of the number of cases in house- 
holds of different sizes. Appropriate probabilistic models 
for such 'final-size' data were presented in a paper by Ball 
[9]. To deal with these mathematically sophisticated 
models, we make use of Bayesian Markov chain Monte
Carlo (MCMC) for statistical analysis [10], which was proposed as
a method for dealing with household final-size epidemic 
data by O'Neill and Roberts [11]. To calculate the likeli- 
hood requires two pieces of information: the actual num- 
ber of cases in each household, and a distribution 
describing the population-level heterogeneity in infec- 
tiousness. For the latter of these, we use a Gamma distri- 
bution as a simple, parametric choice. For the former, we 
have no 'gold standard' test that gives the actual number 
of cases; instead, for each household we have: 

n Household size.

$k_3$ Symptomatic individuals with at least one symptom
suggestive of an acute respiratory infection
(ARI).

$k_2$ Swabbed individuals, who should have met the
case definition of fever and two or more other symptoms
according to the HPA algorithm.

$k_1$ Laboratory-confirmed cases, where PCR testing
of the swab returned a positive result.

Note that for any individual household, $k_1 \le k_2 \le k_3 \le n$.
The histograms showing these data are plotted in Figure 2. 
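
As a rough illustration of the kind of final-size likelihood referred to above, the following Python sketch solves the triangular system of equations of the type given by Ball [9] for a single household with one initial case and no further infection from outside. It is not the authors' code: the function names, the rate parameter `lam`, and the choice of a unit-mean Gamma distribution with variance `theta` are assumptions made here for illustration, and the parameterisation actually used is the one defined in Additional file 2.

```python
from math import comb, exp


def gamma_laplace(s, theta):
    """Laplace transform E[exp(-s*I)] for I ~ Gamma with mean 1, variance theta.

    theta == 0 is treated as the limiting case of identical infectiousness.
    """
    if theta == 0:
        return exp(-s)
    return (1.0 + theta * s) ** (-1.0 / theta)


def final_size_distribution(m, lam, theta, a=1):
    """Return [P_0, ..., P_m], where P_k is the probability that k of the m
    initially susceptible household members are eventually infected, given
    `a` initial cases.

    Solves, for l = 0..m, the triangular system
        sum_{k<=l} C(m-k, l-k) * P_k / phi(lam*(m-l))**(k+a) = C(m, l)
    where phi is the Laplace transform of the infectiousness distribution.
    """
    probs = []
    for l in range(m + 1):
        phi = gamma_laplace(lam * (m - l), theta)
        # Contribution of the already-solved terms P_0..P_{l-1}.
        known = sum(
            comb(m - k, l - k) * probs[k] / phi ** (k + a) for k in range(l)
        )
        # The coefficient of P_l is 1 / phi**(l+a).
        probs.append((comb(m, l) - known) * phi ** (l + a))
    return probs


if __name__ == "__main__":
    # Hypothetical parameter values, not estimates from this study.
    for k, prob in enumerate(final_size_distribution(m=3, lam=0.7, theta=1.0)):
        print(k, round(prob, 4))
```

With `theta = 0` the result reduces to the classical Reed-Frost chain-binomial final-size distribution, which gives a simple check on the recursion.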



The intuition behind our approach is given in Figure 1: 
within-household transmission gives a distinctive, clus- 
tered, pattern to the distribution of cases in households 
that can be broken by stratification of cases. Use of full 
information does, however, allow accurate epidemiological 
information to be obtained. Our full methodology is quite 
technical and is detailed in Additional file 2: Technical 
Appendix. The quantities that we estimate (in statistical 
language, our model parameters) are, however, straightfor- 
ward to interpret: 

$\tau_n$ Probability of transmission between an infectious
and a susceptible individual in a household of size n
(defined unambiguously in Additional file 2).

$\theta$ Variance in infectiousness of H1N1pdm09 cases.

p Probability that a swab of an H1N1pdm09 case
does not return positive.

q Probability that a symptomatic H1N1pdm09 case
is not swabbed.

r Probability that a symptomatic non-H1N1pdm09
individual is swabbed.

s Probability that a non-H1N1pdm09 individual has
symptoms - this is essentially the baseline prevalence
of symptoms indicative of non-H1N1pdm09 ARI.

Perfect case ascertainment therefore corresponds to the 
situation where p and q are both zero, and it is obviously 
also desirable for r and s to be zero. We check that our 
methodology arrives at accurate parameter values for sev- 
eral simulated datasets with different parameter values in 
Additional file 2. 
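
To make the roles of p, q, r and s concrete, the sketch below tabulates the probability of each observable outcome for a true H1N1pdm09 case and for a non-case. It relies on the two assumptions made in this study for the recent pandemic (no false-positive laboratory results, and all cases symptomatic). The function and key names are illustrative only, and this is not the authors' implementation.

```python
def outcome_probabilities(p, q, r, s):
    """Per-individual probabilities of the four observation classes.

    p: P(swab of a true case does not return positive)
    q: P(symptomatic true case is not swabbed)
    r: P(symptomatic non-case is swabbed)
    s: P(non-case has ARI symptoms)
    Assumes no false-positive swabs and that every true case is symptomatic.
    """
    case = {
        "no_symptoms": 0.0,
        "symptomatic_not_swabbed": q,
        "swabbed_negative": (1 - q) * p,
        "swabbed_positive": (1 - q) * (1 - p),
    }
    non_case = {
        "no_symptoms": 1 - s,
        "symptomatic_not_swabbed": s * (1 - r),
        "swabbed_negative": s * r,
        "swabbed_positive": 0.0,
    }
    return case, non_case


# p, q and s are the point estimates reported in the Results;
# r = 0.5 is a placeholder, since r is poorly determined by the data.
case, non_case = outcome_probabilities(p=0.42, q=0.27, r=0.5, s=0.03)
print(round(case["swabbed_positive"], 4))
```

With q = 0.27 and p = 0.42, the product (1 - q)(1 - p) is about 0.42, so under these assumptions only around four in ten true cases would be laboratory confirmed.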

For the recent pandemic, we have made the assumptions 
that case ascertainment through swabbing and laboratory 
confirmation does not lead to false positives, and that to 
acquire immunity or transmit infection individuals must 
be symptomatic. In terms of the latter assumption, while 
there is evidence for asymptomatic seroconversion 
[12-14], our definition of symptomatic cases is particularly 
inclusive, and could include, for example, individuals with 
only a sore throat and no fever. Whether seroconversion 
is possible without even extremely mild symptoms, and if 
it is, the implications of this for susceptibility and trans- 
missibility, remain unclear. Significant completely asymp- 
tomatic acquisition of full immunity would, however, 
require a different model from that adopted here. We 
assume that for the nine cases that had a positive swab 
result but no record of symptoms, there actually were 
symptoms that were not recorded - asymptomatic indivi- 
duals were not supposed to be swabbed, and since this is a 
small proportion of the sample any assumption made does 
not substantially influence the final results. A further 
potential source of bias would be other co-circulating 
respiratory pathogens; however, the expected level of these 
in Birmingham during late Spring and early Summer is 
very low, and so for our study period this is unlikely to
have been important [15,16].

Figure 2 Final size data. The final size data (that is, frequency distribution of total number of cases by household size) for different
classifications of cases: Laboratory confirmation through PCR; being swabbed (most likely due to meeting HPA ILI diagnostic criteria); having any
ARI symptoms; and inferred in the statistical model. Histograms are stratified by household size. ARI, acute respiratory infection; HPA, Health
Protection Agency; ILI, Influenza-like illness.
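
The Bayesian MCMC fitting described under Statistical analysis can be illustrated with a deliberately reduced example. The sketch below assumes that every household consists of one index case and one contact, so that the final-size likelihood collapses to a binomial in a single transmission probability `tau`, and it uses a flat prior and a random-walk Metropolis sampler. All names, the data and the tuning values are invented for illustration. The full analysis involves more parameters and household sizes and is described in Additional file 2.

```python
import math
import random


def log_posterior(tau, infected, contacts):
    """Log posterior for tau under a flat prior on (0, 1) and a binomial
    likelihood (one index case and one contact per household)."""
    if not 0.0 < tau < 1.0:
        return -math.inf
    return infected * math.log(tau) + (contacts - infected) * math.log(1.0 - tau)


def metropolis(infected, contacts, n_iter=5000, step=0.1, seed=1):
    """Random-walk Metropolis sampler; returns the full chain of tau values."""
    rng = random.Random(seed)
    tau = 0.5
    current = log_posterior(tau, infected, contacts)
    chain = []
    for _ in range(n_iter):
        proposal = tau + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, infected, contacts)
        # Accept with probability min(1, posterior ratio); proposals outside
        # (0, 1) have log posterior -inf and are always rejected.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            tau, current = proposal, candidate
        chain.append(tau)
    return chain


# Invented data: 30 of 100 household contacts infected.
chain = metropolis(infected=30, contacts=100)
kept = chain[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

In this toy setting the posterior is available in closed form as a Beta distribution, which makes it a convenient check on a sampler before moving to models where no closed form exists.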

Results and discussion 

There are three main sets of results from our statistical 
analysis: estimates of transmission probabilities stratified 
by household size; an estimate of the population-level het- 
erogeneity of infectiousness; and estimates of probabilities 
describing case ascertainment. 

Figure 3 shows the inferred transmission probabilities 
for the full model, which includes the effects of imper- 
fect case ascertainment (black circles). Also shown are 
the results that would have been obtained using differ- 
ent case definitions: symptomatic individuals (blue 
upwards-facing triangles), swabbed individuals (green
squares), and individuals with a positive swab result (red
downwards-facing triangles). Each of these three defini- 
tions has been used previously by other researchers; 
while some studies presented results using different case 
definitions, our work is unique in combining different 
definitions with a model of case ascertainment to pro- 
vide an explicit estimate of transmission probability. 



Using the full model, we arrive at a 'true' SAR of 39.7
[34.9,44.0]%, compared to 16.0 [13.4,18.7]% for PCR, 35.2
[31.4,39.1]% for influenza-like illness (ILI), and 51.9
[47.5,56.4]% for ARI.
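
For comparison with the model-based estimate, a crude SAR under a given case definition can be computed directly from household counts. The sketch below assumes the common definition (secondary cases divided by household contacts, with exactly one index case per household counted as a case under every definition). The formal definition used in this paper is given in Additional file 1, and the three example households are invented for illustration.

```python
def crude_sar(households, definition):
    """Crude secondary attack rate for one case definition.

    households: list of dicts with household size "n" and case counts
                "k1" (PCR positive), "k2" (swabbed), "k3" (symptomatic).
    definition: which count to treat as cases ("k1", "k2" or "k3").
    Assumes exactly one index case per household, included in every count.
    """
    secondary = sum(h[definition] - 1 for h in households)
    contacts = sum(h["n"] - 1 for h in households)
    return secondary / contacts


# Invented data, for illustration only.
example = [
    {"n": 4, "k1": 1, "k2": 2, "k3": 3},
    {"n": 3, "k1": 2, "k2": 2, "k3": 2},
    {"n": 5, "k1": 1, "k2": 1, "k3": 2},
]
for name in ("k1", "k2", "k3"):
    print(name, round(crude_sar(example, name), 3))
```

Because the case definitions are nested, the crude SAR cannot decrease as the definition broadens, which is the ordering seen in the PCR, ILI and ARI figures above.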

The variance in infectiousness of cases is shown in the
left-hand panel of Figure 4. The interpretation of this
heterogeneity parameter, $\theta$, is in general quite technical; but if
it takes the value zero then each case has exactly the same
infectiousness, while variability increases with $\theta$ so that if
its value is close to one then the top 10% of cases are
more than twenty times as infectious over the total course
of their infection as the bottom 10%. While there is a lot
of uncertainty in our estimate of $\theta$, all of the values in the
95% CI represent significant population-level heterogeneity
in infectiousness.
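
The 'more than twenty times' figure can be checked in a special case. If infectiousness is Gamma distributed with unit mean, so that a variance of one corresponds to a shape parameter of one (an assumption made here for illustration; the exact parameterisation is given in Additional file 2), the distribution is exponential and its quantiles have a closed form. The sketch compares the 90th and 10th percentiles.

```python
from math import log


def exponential_quantile(u, mean=1.0):
    """Quantile function of an exponential distribution (Gamma with shape 1)."""
    return -mean * log(1.0 - u)


# Variance 1 with unit mean: Gamma(shape=1, scale=1), i.e. exponential.
high = exponential_quantile(0.90)  # threshold for the most infectious 10%
low = exponential_quantile(0.10)   # threshold for the least infectious 10%
ratio = high / low
print(round(ratio, 1))  # 21.9
```

Under this assumed parameterisation, every case in the top decile is at least about 21.9 times as infectious as any case in the bottom decile, consistent with the statement above.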

The probabilities relevant to case ascertainment are 
shown in the right-hand panel of Figure 4. We consider 
these in turn. Our estimates are that for cases of influenza 
A(H1N1)pdm09, 27 [24,30]% were not swabbed, and of
those that were 42 [38,47]% did not return a positive swab.
This is qualitatively consistent with the serological work of 
Miller et al. [17], although differences in study population 
mean that a quantitative comparison cannot be made.
Our estimate for the baseline attack rate with non-
H1N1pdm09 ARI over the relevant time period is 3
[0.3,7]%, which is again broadly consistent with other
work [15,16]. Concerning non-H1N1pdm09 cases of ARI,
our estimate of the proportion swabbed contains too
much uncertainty to inform policy on the basis of the
dataset considered.

Figure 3 Transmission probabilities. Transmission probabilities for different household sizes by three different case definitions and inferred in
the statistical model, with point estimates and 95% CI shown.

Finally, in addition to the original data, Figure 2 shows
the inferred distribution of H1N1pdm09 final-size
probabilities (black circles). An important feature of these
distributions is that they are often bimodal rather than
unimodal, that is, they look more like the letter 'u' than the
letter 'n'. It is this unusual shape that allows for more 
information to be extracted from household final-size data 
than is available for crude population-level estimates of 
prevalence and incidence without household stratification. 

Conclusions 

In this study, we have analyzed the household-stratified 
early infection patterns for pandemic influenza in inner- 
city Birmingham, UK. We have used modern computa- 
tionally-intensive statistical methods to fit a realistic 
model for transmission, and our comprehensive litera- 
ture search [see Additional file 1] indicated that our 
approach to modelling the case ascertainment of influenza
A(H1N1)pdm09 is novel and provides valuable
additional information. 

Three key conclusions can be drawn from our work. 
First, the level of within-household transmission can be 
estimated directly, despite difficulties in case ascertain- 
ment. An estimate of this quantity is important if anti- 
virals are distributed prophylactically to household 
contacts of cases; if there is little transmission within 
the household then such a policy is less likely to be 
effective and vice versa. 

Considering our results, we arrive at a 'rule of thumb' 
for the H1N1pdm09 pandemic that transmissibility lies
somewhere between what would be predicted from the 
HPA definition of ILI (swabbed) and a less specific report- 
ing of ARI symptoms (symptomatic). Our results therefore 
provide evidence that relying solely on laboratory-con- 
firmed cases is excessively stringent and consistently leads 
to under-estimation of transmission, as would be expected 
from serological work [17,18]. An additional consequence 
of relying on laboratory confirmation is that given this 
case definition, the transmission probabilities do not 
decline swiftly as household size increases, while our full 
model shows a reduction in transmission probability as 
household size increases, as expected, with the exception 
of household size seven (we did not find any direct cause 
for this anomaly). The question of the relative importance 
of large households for epidemic spread remains signifi- 
cant, and while pre-pandemic analysis of seasonal influ- 
enza suggested decline with size [19,20], this was not a 
consistent observation during the pandemic as seen in 
[21] and our literature review [see Additional file 1]. Study 
design may be an important part of variability [21], and 
our results show that case ascertainment is also relevant. 

As part of estimating the transmission process, we 
also calculated the probability of a false negative PCR 
result. Forty-two percent of infected cases are estimated
to have had a negative laboratory test, which has significant
public health importance, and may have been
caused by a combination of a number of factors includ- 
ing: problems encountered with taking the swabs 
[22,23]; swabbing individuals who were not in the early 
stages of their illness [24]; and potentially swabbing 
individuals with milder forms of illness. Exploring these 
factors could also be the focus of future work. 

Secondly, our analysis provides additional support for 
the picture of the recent influenza pandemic as one 
with highly variable clinical outcomes, including signifi- 
cant numbers of cases who did not meet the HPA's 
diagnostic criteria, but are likely to have been true cases, 
and a high variance in the infectiousness of cases; that is,
there were many cases who were not particularly infec- 
tious, while a relatively small minority had an extremely 
high probability of passing influenza on to their house- 
hold contacts. 

Finally, and most significantly, our approach could be 
used in future outbreaks as a rapid complement to sero- 
logical work. Serology provides an important indepen- 
dent test of clinical surveillance methods, but is costly 
and the correct epidemiological interpretation of an 
individual's titre is not always clear. Our methods are 
inexpensive and model the epidemiology of disease 
transmission directly, giving the potential for an early 
snapshot of the proportion of cases ascertained. 

While we have given certain questions priority in our 
analysis, as is unavoidable, there are factors that were not 
captured in our model. We believe that the stratification 
of cases by age is the most significant omission from our 
analysis, while other potentially important factors are 
estimation of between-household transmission and the 
efficacy of interventions such as encouragement of perso- 
nal hygiene measures and use of antiviral drugs. In gen- 
eral, inclusion of these additional complexities will lead 
to stratification of our transmission estimates by age, 
time to treatment and prophylaxis and so on, in addition 
to household size, but these may still on average be simi- 
lar to our unstratified estimates. The expectation from 
our literature review would be for lower transmission 
among those given antivirals early and adults than those 
given antivirals late and children, but the often subtle 
effects of transmission dynamics mean that this can only 
be conjectured in the absence of a full analysis. 

Ultimately, our ability to extend the model relies on 
sufficient data being available. Our data are of good qual- 
ity, but still only contain a finite amount of information. 
Furthermore, as highlighted above, there are some missing data
on which individuals were managed after 18 June, and 
therefore treated on the basis of clinical suspicion rather 
than swabbed. In our review of the literature on house- 
hold transmission of pandemic influenza [see Additional 
file 1] we found many studies, involving between them 
several thousand cases and household contacts, that produced
relevant data. Of these, only a small fraction fitted
a transmission model to extract generalizable epidemio- 
logical conclusions. 

The current UK Influenza Preparedness Strategy stresses 
the need for rapid research early in a pandemic to improve 
understanding and inform response, and to develop appro- 
priate protocols for such research [1]. We suggest that 
protocols for collection, sharing and meta-analysis of 
household data should form part of this preparedness. The 
data for the studies we found were mostly collected before 
the end of June 2009. Much of the information collected 
during these studies, in particular syndromic information 
and household stratification, was not reported and used at 
the time. An internationally co-ordinated meta-analysis of 
household data during July 2009, fitting transmission para- 
meters and adjusting for case definitions so that meaning- 
ful comparisons could be made across different 
demographic and healthcare contexts, could have provided 
useful information about the pandemic at relatively low 
cost. In particular, these estimates of disease transmission 
could be used in a timely fashion to guide changes in pub- 
lic health management strategies, which in England in 
2009 were made only in areas where there was evidence of 
sustained community transmission. 

Additional material 



Additional file 1: Literature Review. PDF containing the literature 
review. 

Additional file 2: Technical Appendix. PDF containing the technical 
background to the work. 